Video coding methods and apparatuses

ABSTRACT

Video coding methods and apparatuses are provided that make use of various models and/or modes to significantly improve coding efficiency especially for high/complex motion sequences. The methods and apparatuses take advantage of the temporal and/or spatial correlations that may exist within portions of the frames, e.g., at the Macroblock level, etc. The methods and apparatuses tend to significantly reduce the amount of data required for encoding motion information while retaining or even improving video image quality.

RELATED PATENT APPLICATIONS

[0001] This U.S. Non-provisional Application for Letters Patent claims the benefit of priority from, and hereby incorporates by reference the entire disclosure of, co-pending U.S. Provisional Application for Letters Patent Serial No. 60/376,005, filed Apr. 26, 2002, and titled “Video Coding Methods and Arrangements”.

[0002] This U.S. Non-provisional Application for Letters Patent further claims the benefit of priority from, and hereby incorporates by reference the entire disclosure of, co-pending U.S. Provisional Application for Letters Patent Serial No. 60/352,127, filed Jan. 15, 2002.

TECHNICAL FIELD

[0003] This invention relates to video coding, and more particularly to methods and apparatuses for providing improved coding and/or prediction techniques associated with different types of video data.

BACKGROUND

[0004] The motivation for increased coding efficiency in video coding has led to the adoption in the Joint Video Team (JVT) (a standards body) of more refined and complicated models and modes describing motion information for a given macroblock. These models and modes tend to take better advantage of the temporal redundancies that may exist within a video sequence. See, for example, ITU-T, Video Coding Expert Group (VCEG), “JVT Coding—(ITU-T H.26L & ISO/IEC JTC1 Standard)—Working Draft Number 2 (WD-2)”, ITU-T JVT-B118, March 2002; and/or Heiko Schwarz and Thomas Wiegand, “Tree-structured macroblock partition”, Doc. VCEG-N17, December 2001.

[0005] The recent models include, for example, multi-frame indexing of the motion vectors, increased sub-pixel accuracy, multi-referencing, and tree structured macroblock and motion assignment, according to which different sub areas of a macroblock are assigned to different motion information. Unfortunately, these models also tend to significantly increase the required percentage of bits for the encoding of motion information within a sequence. Thus, in some cases the models tend to reduce the efficacy of such coding methods.

[0006] Even though, in some cases, motion vectors are differentially encoded versus a spatial predictor, or even skipped in the case of zero motion while having no residue image to transmit, this does not appear to be sufficient for improved efficiency.

[0007] It would, therefore, be advantageous to further reduce the bits required for the encoding of motion information, and thus of the entire sequence, while at the same time not significantly affecting quality.

[0008] Another problem that is also introduced by the adoption of such models and modes is that of determining the best mode among all possible choices, for example, given a goal bitrate, encoding/quantization parameters, etc. Currently, this problem can be partially solved by the use of cost measures/penalties depending on the mode and/or the quantization to be used, or even by employing Rate Distortion Optimization techniques with the goal of minimizing a Lagrangian function.

[0009] Such problems and others become even more significant, however, in the case of Bidirectionally Predictive (B) frames where a macroblock may be predicted from both future and past frames. This essentially means that an even larger percentage of bits may be required for the encoding of motion vectors.

[0010] Hence, there is a need for improved methods and apparatuses for use in coding (e.g., encoding and/or decoding) video data.

SUMMARY

[0011] Video coding methods and apparatuses are provided that make use of various models and/or modes to significantly improve coding efficiency especially for high/complex motion sequences. The methods and apparatuses take advantage of the temporal and/or spatial correlations that may exist within portions of the frames, e.g., at the Macroblock level, etc. The methods and apparatuses tend to significantly reduce the amount of data required for encoding motion information while retaining or even improving video image quality.

[0012] Thus, by way of example, in accordance with certain implementations of the present invention, a method for use in encoding video data within a sequence of video frames is provided. The method includes encoding at least a portion of a reference frame to include motion information associated with the portion of the reference frame. The method further includes defining at least a portion of at least one predictable frame that includes video data predictively correlated to the portion of the reference frame based on the motion information, and encoding at least the portion of the predictable frame without including corresponding motion information, but including mode identifying data that identifies that the portion of the predictable frame can be directly derived using the motion information associated with the portion of the reference frame.

[0013] An apparatus for use in encoding video data for a sequence of video frames into a plurality of video frames including at least one predictable frame is also provided. Here, for example, the apparatus includes memory and logic, wherein the logic is configured to encode at least a portion of at least one reference frame to include motion information associated with the portion of the reference frame. The logic also determines at least a portion of at least one predictable frame that includes video data predictively correlated to the portion of the reference frame based on the motion information, and encodes at least the portion of the predictable frame such that mode identifying data is provided to specify that the portion of the predictable frame can be derived using the motion information associated with the portion of the reference frame.

[0014] In accordance with still other exemplary implementations, a method is provided for use in decoding encoded video data that includes at least one predictable video frame. The method includes determining motion information associated with at least a portion of at least one reference frame and buffering the motion information. The method also includes determining mode identifying data that identifies that at least a portion of a predictable frame can be directly derived using at least the buffered motion information, and generating the portion of the predictable frame using the buffered motion information.

[0015] An apparatus is also provided for decoding video data. The apparatus includes memory and logic, wherein the logic is configured to buffer in the memory motion information associated with at least a portion of at least one reference frame, ascertain mode identifying data that identifies that at least a portion of a predictable frame can be directly derived using at least the buffered motion information, and generate the portion of the predictable frame using the buffered motion information.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings. The same numbers are used throughout the figures to reference like components and/or features.

[0017] FIG. 1 is a block diagram depicting an exemplary computing environment that is suitable for use with certain implementations of the present invention.

[0018] FIG. 2 is a block diagram depicting an exemplary representative device that is suitable for use with certain implementations of the present invention.

[0019] FIG. 3 is an illustrative diagram depicting a Direct Motion Projection technique suitable for use in B Frame coding, in accordance with certain exemplary implementations of the present invention.

[0020] FIG. 4 is an illustrative diagram depicting Direct P and B coding techniques within a sequence of video frames, in accordance with certain exemplary implementations of the present invention.

[0021] FIG. 5 is an illustrative diagram depicting Direct Motion Prediction for collocated macroblocks having identical motion information, in accordance with certain exemplary implementations of the present invention.

[0022] FIG. 6 is an illustrative diagram depicting the usage of acceleration information in Direct Motion Projection, in accordance with certain exemplary implementations of the present invention.

[0023] FIG. 7 is an illustrative diagram depicting a Direct Pixel Projection technique suitable for use in B Frame coding, in accordance with certain exemplary implementations of the present invention.

[0024] FIG. 8 is an illustrative diagram depicting a Direct Pixel Projection technique suitable for use in P Frame coding, in accordance with certain exemplary implementations of the present invention.

[0025] FIG. 9 is a block diagram depicting an exemplary conventional video encoder.

[0026] FIG. 10 is a block diagram depicting an exemplary conventional video decoder.

[0027] FIG. 11 is a block diagram depicting an exemplary improved video encoder using Direct Prediction, in accordance with certain exemplary implementations of the present invention.

[0028] FIG. 12 is a block diagram depicting an exemplary improved video decoder using Direct Prediction, in accordance with certain exemplary implementations of the present invention.

[0029] FIG. 13 is an illustrative diagram depicting a Direct Pixel/Block Projection technique, in accordance with certain exemplary implementations of the present invention.

[0030] FIG. 14 is an illustrative diagram depicting a Direct Motion Projection technique suitable for use in B Frame coding, in accordance with certain exemplary implementations of the present invention.

[0031] FIG. 15 is an illustrative diagram depicting motion vector predictions, in accordance with certain exemplary implementations of the present invention.

[0032] FIG. 16 is an illustrative diagram depicting interlace coding techniques for P frames, in accordance with certain exemplary implementations of the present invention.

[0033] FIG. 17 is an illustrative diagram depicting interlace coding techniques for B frames, in accordance with certain exemplary implementations of the present invention.

[0034] FIG. 18 is an illustrative diagram depicting interlace coding techniques using frame and field based coding, in accordance with certain exemplary implementations of the present invention.

[0035] FIG. 19 is an illustrative diagram depicting a scheme for coding joint field/frame images, in accordance with certain exemplary implementations of the present invention.

DETAILED DESCRIPTION

[0036] In accordance with certain aspects of the present invention, methods and apparatuses are provided for coding (e.g., encoding and/or decoding) video data. The methods and apparatuses can be configured to enhance the coding efficiency of “interlace” or progressive video coding streaming technologies. In certain implementations, for example, with regard to the current H.26L standard, so called “P-frames” have been significantly enhanced by introducing several additional macroblock Modes. In some cases it may now be necessary to transmit up to 16 motion vectors per macroblock. Certain aspects of the present invention provide a way of encoding these motion vectors. For example, as described below, Direct P prediction techniques can be used to select the motion vectors of collocated pixels in the previous frame.

[0037] While these and other exemplary methods and apparatuses are described, it should be kept in mind that the techniques of the present invention are not limited to the examples described and shown in the accompanying drawings, but are also clearly adaptable to other similar existing and future video coding schemes, etc.

[0038] Before introducing such exemplary methods and apparatuses, an introduction is provided in the following section for suitable exemplary operating environments, for example, in the form of a computing device and other types of devices/appliances.

[0039] Exemplary Operational Environments:

[0040] Turning to the drawings, wherein like reference numerals refer to like elements, the invention is illustrated as being implemented in a suitable computing environment. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer.

[0041] Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, portable communication devices, and the like.

[0042] The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

[0043] FIG. 1 illustrates an example of a suitable computing environment 120 on which the subsequently described systems, apparatuses and methods may be implemented. Exemplary computing environment 120 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the improved methods and systems described herein. Neither should computing environment 120 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in computing environment 120.

[0044] The improved methods and systems herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable include, but are not limited to, personal computers, server computers, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

[0045] As shown in FIG. 1, computing environment 120 includes a general-purpose computing device in the form of a computer 130. The components of computer 130 may include one or more processors or processing units 132, a system memory 134, and a bus 136 that couples various system components including system memory 134 to processor 132.

[0046] Bus 136 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus also known as Mezzanine bus.

[0047] Computer 130 typically includes a variety of computer readable media. Such media may be any available media that is accessible by computer 130, and it includes both volatile and non-volatile media, removable and non-removable media.

[0048] In FIG. 1, system memory 134 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 140, and/or non-volatile memory, such as read only memory (ROM) 138. A basic input/output system (BIOS) 142, containing the basic routines that help to transfer information between elements within computer 130, such as during start-up, is stored in ROM 138. RAM 140 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 132.

[0049] Computer 130 may further include other removable/non-removable, volatile/non-volatile computer storage media. For example, FIG. 1 illustrates a hard disk drive 144 for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”), a magnetic disk drive 146 for reading from and writing to a removable, non-volatile magnetic disk 148 (e.g., a “floppy disk”), and an optical disk drive 150 for reading from or writing to a removable, non-volatile optical disk 152 such as a CD-ROM/R/RW, DVD-ROM/R/RW/+R/RAM or other optical media. Hard disk drive 144, magnetic disk drive 146 and optical disk drive 150 are each connected to bus 136 by one or more interfaces 154.

[0050] The drives and associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules, and other data for computer 130. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 148 and a removable optical disk 152, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.

[0051] A number of program modules may be stored on the hard disk, magnetic disk 148, optical disk 152, ROM 138, or RAM 140, including, e.g., an operating system 158, one or more application programs 160, other program modules 162, and program data 164.

[0052] The improved methods and systems described herein may be implemented within operating system 158, one or more application programs 160, other program modules 162, and/or program data 164.

[0053] A user may provide commands and information into computer 130 through input devices such as keyboard 166 and pointing device 168 (such as a “mouse”). Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, camera, etc. These and other input devices are connected to the processing unit 132 through a user input interface 170 that is coupled to bus 136, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).

[0054] A monitor 172 or other type of display device is also connected to bus 136 via an interface, such as a video adapter 174. In addition to monitor 172, personal computers typically include other peripheral output devices (not shown), such as speakers and printers, which may be connected through output peripheral interface 175.

[0055] Computer 130 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 182. Remote computer 182 may include many or all of the elements and features described herein relative to computer 130.

[0056] Logical connections shown in FIG. 1 are a local area network (LAN) 177 and a general wide area network (WAN) 179. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.

[0057] When used in a LAN networking environment, computer 130 is connected to LAN 177 via network interface or adapter 186. When used in a WAN networking environment, the computer typically includes a modem 178 or other means for establishing communications over WAN 179. Modem 178, which may be internal or external, may be connected to system bus 136 via the user input interface 170 or other appropriate mechanism.

[0058] Depicted in FIG. 1 is a specific implementation of a WAN via the Internet. Here, computer 130 employs modem 178 to establish communications with at least one remote computer 182 via the Internet 180.

[0059] In a networked environment, program modules depicted relative to computer 130, or portions thereof, may be stored in a remote memory storage device. Thus, e.g., as depicted in FIG. 1, remote application programs 189 may reside on a memory device of remote computer 182. It will be appreciated that the network connections shown and described are exemplary and other means of establishing a communications link between the computers may be used.

[0060] Attention is now drawn to FIG. 2, which is a block diagram depicting another exemplary device 200 that is also capable of benefiting from the methods and apparatuses disclosed herein. Device 200 is representative of any one or more devices or appliances that are operatively configured to process video and/or any related types of data in accordance with all or part of the methods and apparatuses described herein and their equivalents. Thus, device 200 may take the form of a computing device as in FIG. 1, or some other form, such as, for example, a wireless device, a portable communication device, a personal digital assistant, a video player, a television, a DVD player, a CD player, a karaoke machine, a kiosk, a digital video projector, a flat panel video display mechanism, a set-top box, a video game machine, etc. In this example, device 200 includes logic 202 configured to process video data, a video data source 204 configured to provide video data to logic 202, and at least one display module 206 capable of displaying at least a portion of the video data for a user to view. Logic 202 is representative of hardware, firmware, software and/or any combination thereof. In certain implementations, for example, logic 202 includes a compressor/decompressor (codec), or the like. Video data source 204 is representative of any mechanism that can provide, communicate, output, and/or at least momentarily store video data suitable for processing by logic 202. Video data source 204 is illustratively shown as being within and/or without device 200. Display module 206 is representative of any mechanism that a user might view directly or indirectly and see the visual results of video data presented thereon. Additionally, in certain implementations, device 200 may also include some form or capability for reproducing or otherwise handling audio data associated with the video data. Thus, an audio reproduction module 208 is shown.

[0061] With the examples of FIGS. 1 and 2 in mind, and others like them, the next sections focus on certain exemplary methods and apparatuses that may be at least partially practiced using such environments and such devices.

[0062] Direct Prediction for Predictive (P) and Bidirectionally Predictive (B) Frames in Video Coding:

[0063] This section presents a new highly efficient Inter Macroblock type that can significantly improve coding efficiency especially for high/complex motion sequences. This new Inter Macroblock type takes advantage of the temporal and spatial correlations that may exist within frames at the macroblock level, and as a result can significantly reduce the bits required for encoding motion information while retaining or even improving quality.

[0064] Direct Prediction

[0065] The above mentioned problems and/or others are at least partially solved herein by the introduction of a “Direct Prediction Mode” wherein, instead of encoding the actual motion information, the forward and/or backward motion vectors are derived directly from the motion vectors used in the correlated macroblock of the subsequent reference frame.

[0066] This is illustrated, for example, in FIG. 3, which shows three video frames, namely a P frame 300, a B frame 302 and P frame 304, corresponding to times t, t+1, and t+2, respectively. Also illustrated in FIG. 3 are macroblocks within frames 300, 302 and 304 and exemplary motion vector (MV) information. Here, the frames have x and y coordinates associated with them. The motion vector information for B frame 302 is predicted (here, e.g., interpolated) from the motion vector information encoded for P frames 300 and 304. The exemplary technique is derived from the assumption that an object is moving with constant speed, thus making it possible to predict its current position inside B frame 302 without having to transmit any motion vectors. While this technique may reduce the bitrate significantly for a given quality, it may not always be applicable.

[0067] Introduced herein, in accordance with certain implementations of the present invention, is a new Inter Macroblock type that can effectively exploit spatial and temporal correlations that may exist at the macroblock level and in particular with regard to the motion vector information of the macroblock. According to this new mode it is possible that a current macroblock may have motion that can be directly derived from previously decoded information (e.g., Motion Projection). Thus, as illustratively shown in FIG. 4, there may not be a need to transmit any motion vectors, not only for a macroblock, but even for an entire frame. Here, a sequence 400 of video frames is depicted with solid arrows indicating coded relationships between frames and dashed lines indicating predictable macroblock relationships. Video frame 402 is an I frame, video frames 404, 406, 410, and 412 are B frames, and video frames 408 and 414 are P frames. In this example, if P frame 408 has a motion field described by $\overrightarrow{MF}_{408}$, the motion of the collocated macroblocks in pictures 404, 406, and 414 is also highly correlated. In particular, assuming that speed is in general constant on the entire frame and that frames 404 and 406 are equally spaced in time between frames 402 and 408, and also considering that for B frames both forward and backward motion vectors could be used, the motion fields in frame 404 could be equal to $\overrightarrow{MF}_{404}^{fw} = \frac{1}{3} \times \overrightarrow{MF}_{408}$ and $\overrightarrow{MF}_{404}^{bw} = -\frac{2}{3} \times \overrightarrow{MF}_{408}$ for forward and backward motion fields respectively. Similarly, for frame 406 the motion fields could be $\overrightarrow{MF}_{406}^{fw} = \frac{2}{3} \times \overrightarrow{MF}_{408}$ and $\overrightarrow{MF}_{406}^{bw} = -\frac{1}{3} \times \overrightarrow{MF}_{408}$ for forward and backward motion vectors respectively. Since the interval from frame 408 to frame 414 equals the interval from frame 402 to frame 408, then, using the same assumption, the collocated macroblock could have motion vectors $\overrightarrow{MF}_{414} = \overrightarrow{MF}_{408}$.
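
By way of illustration only, the temporal scaling described above for equally spaced B frames might be sketched as follows. The function name project_direct_mvs and its parameters are hypothetical and are not part of the described implementations; the sketch assumes constant speed, and assumes that the vector of the collocated macroblock in the later P frame points back to the earlier reference frame.

```python
from fractions import Fraction


def project_direct_mvs(ref_mv, b_index, num_b_frames):
    """Derive (forward, backward) vectors for one B frame by temporal scaling.

    ref_mv       -- (x, y) vector of the collocated macroblock in the later
                    P frame, pointing back to the earlier reference frame
    b_index      -- 1-based position of the B frame between the references
    num_b_frames -- number of equally spaced B frames between the references
    """
    if not 1 <= b_index <= num_b_frames:
        raise ValueError("b_index must lie between 1 and num_b_frames")
    span = num_b_frames + 1              # frame intervals between the references
    fw_scale = Fraction(b_index, span)   # share of the motion already covered
    bw_scale = fw_scale - 1              # remaining share, opposite direction
    forward = tuple(fw_scale * c for c in ref_mv)
    backward = tuple(bw_scale * c for c in ref_mv)
    return forward, backward
```

With two B frames between the references, the forward and backward scale factors of each B frame differ by exactly one, so the two derived vectors together span the full motion of the collocated macroblock. Exact fractions are used here only to keep the illustration free of rounding; an actual codec would round to its supported sub-pixel accuracy.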

[0068] Similar to the Direct Mode in B frames, by again assuming that speed is constant, motion for a macroblock can be directly derived from the correlated macroblock of the reference frame. This is further illustrated in FIG. 6, for example, which shows three video frames, namely a P frame 600, a B frame 602 and P frame 604, corresponding to times t, t+1, and t+2, respectively. Here, the illustrated collocated macroblocks have similar if not identical motion information.

[0069] It is even possible to consider acceleration for refining such motion parameters; see, for example, FIG. 7. Here, for example, three frames are shown, namely a current frame 704 at time t, and previous frames 702 (time t−1) and 700 (time t−2), with different acceleration information illustrated by different length motion vectors.
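
As a minimal sketch of one way acceleration could refine a projected vector, the following assumes a constant-acceleration model in which the change between the two previous vectors is simply repeated. This particular formula, and the name predict_mv_with_acceleration, are assumptions made for illustration; the description does not prescribe a specific acceleration model.

```python
def predict_mv_with_acceleration(mv_prev, mv_prev2):
    """Extrapolate the current vector from the two previous frames.

    mv_prev  -- (x, y) vector of the collocated block at time t-1
    mv_prev2 -- (x, y) vector of the collocated block at time t-2
    Assumes the change in motion between frames (the acceleration)
    stays constant from one frame interval to the next.
    """
    accel = (mv_prev[0] - mv_prev2[0], mv_prev[1] - mv_prev2[1])
    return (mv_prev[0] + accel[0], mv_prev[1] + accel[1])
```

When the two previous vectors are equal, the acceleration term vanishes and the sketch reduces to the constant-speed case.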

[0070] The process may also be significantly improved by, instead of considering motion projection at the macroblock level, taking into account that the pixels inside the previous image are possibly moving with a constant speed or a constant acceleration (e.g., Pixel Projection). As such, one may generate a significantly more accurate prediction of the current frame for B frame coding as illustrated, for example, in FIG. 8, and for P frame coding as illustrated, for example, in FIG. 9. FIG. 8, for example, shows three video frames, namely a P frame 800, a B frame 802 and P frame 804, corresponding to times t, t+1, and t+2, respectively. FIG. 9, for example, shows three video frames, namely a P frame 900, a B frame 902 and P frame 904, corresponding to times t, t+1, and t+2, respectively.
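
A rough sketch of pixel projection under the constant-speed assumption is given below. The function project_pixels, the per-pixel motion layout, and the handling of holes and collisions are all assumptions made for illustration only; the description leaves these details open.

```python
def project_pixels(prev_frame, motion):
    """Predict the next frame by moving each pixel along its own vector.

    prev_frame -- 2-D list of pixel values for the previous frame
    motion     -- 2-D list of (dx, dy) integer displacements, one per pixel,
                  assumed to continue unchanged into the next frame
    """
    height, width = len(prev_frame), len(prev_frame[0])
    predicted = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = motion[y][x]
            tx, ty = x + dx, y + dy
            # Pixels projected outside the frame are dropped; when two
            # pixels land on the same target, the later one overwrites.
            if 0 <= tx < width and 0 <= ty < height:
                predicted[ty][tx] = prev_frame[y][x]
    # Holes that no pixel was projected onto fall back to the collocated
    # pixel of the previous frame (an assumed, simple concealment choice).
    for y in range(height):
        for x in range(width):
            if predicted[y][x] is None:
                predicted[y][x] = prev_frame[y][x]
    return predicted
```

For a single row of four pixels that all move one position to the right, the first position is left uncovered and is filled from the collocated pixel.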

[0071] In certain implementations it is also possible to combine both methods together for even better performance.

[0072] In accordance with certain further implementations, motion can also be derived from spatial information, for example, using prediction techniques employed for the coding of motion vectors from the motion information of the surrounding macroblocks. Additionally, performance can also be further enhanced by combining these two different methods in a multi-hypothesis prediction architecture that does not require motion information to be transmitted. Consequently, such new macroblock types can achieve significant bitrate reductions while achieving similar or improved quality.

[0073] Exemplary Encoding Processes:

[0074] FIG. 10 illustrates an exemplary encoding environment 1000, having a conventional block based video encoder 1002, wherein video data 1004 is provided to encoder 1002 and a corresponding encoded video data bitstream is output.

[0075] Video data 1004 is provided to a summation module 1006, which also receives as an input the output from a motion compensation (MC) module 1022. The output from summation module 1006 is provided to a discrete cosine transform (DCT) module 1010. The output of DCT module 1010 is provided as an input to a quantization module (QP) 1012. The output of QP module 1012 is provided as an input to an inverse quantization module (QP⁻¹) 1014 and as an input to a variable length coding (VLC) module 1016. VLC module 1016 also receives as an input an output from a motion estimation (ME) module 1008. The output of VLC module 1016 is an encoded video bitstream 1210.

[0076] The output of QP⁻¹ module 1014 is provided as an input to an inverse discrete cosine transform (inverse DCT) module 1018. The output of module 1018 is provided as an input to a summation module 1020, which has as another input the output from MC module 1022. The output from summation module 1020 is provided as an input to a loop filter module 1024. The output from loop filter module 1024 is provided as an input to a frame buffer module 1026. One output from frame buffer module 1026 is provided as an input to ME module 1008, and another output is provided as an input to MC module 1022. ME module 1008 also receives video data 1004 as an input. An output from ME module 1008 is provided as an input to MC module 1022.

[0077] In this example, MC module 1022 receives inputs from ME module 1008. Here, ME is performed on a current frame against a reference frame. ME can be performed using various block sizes and search ranges, after which a “best” parameter, using some predefined criterion for example, is encoded and transmitted (INTER coding). The residue information is also coded after performing DCT and QP. It is also possible that in some cases the performance of ME does not produce a satisfactory result, and thus a macroblock, or even a subblock, could be INTRA encoded.

[0078] Considering that motion information could be quite costly, the encoding process can be modified as in FIG. 12, in accordance with certain exemplary implementations of the present invention, to also consider in a further process the possibility that the motion vectors for a macroblock could be temporally and/or spatially predicted from previously encoded motion information. Such decisions, for example, can be performed using Rate Distortion Optimization techniques or other cost measures. Using such techniques/modes it may not be necessary to transmit detailed motion information, because such may be replaced with a Direct Prediction (Direct P) Mode, e.g., as illustrated in FIG. 5.
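
The mode decision mentioned above might, for example, be sketched as a comparison of Lagrangian costs of the common form J = D + λ·R, where D is the distortion and R the number of bits of a candidate mode. The helper names, the dictionary layout of a candidate, and the numeric values used with them are hypothetical; the description does not fix a particular cost function.

```python
def lagrangian_cost(distortion, rate_bits, lam):
    """Rate-distortion cost J = D + lambda * R for one candidate mode."""
    return distortion + lam * rate_bits


def choose_mode(candidates, lam):
    """Return the name of the candidate mode with the lowest cost.

    candidates -- list of dicts with keys "mode", "distortion", "rate_bits"
    lam        -- Lagrange multiplier trading distortion against bits
    """
    best = min(
        candidates,
        key=lambda c: lagrangian_cost(c["distortion"], c["rate_bits"], lam),
    )
    return best["mode"]
```

Because a Direct mode carries little or no motion information, its rate term is small, so it tends to win when the multiplier is large (low bitrate targets) even if its distortion is somewhat higher than that of a mode with explicitly transmitted vectors.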

[0079] Motion can be modeled, for example, in any of the following models or their combinations: (1) Motion Projection (e.g., as illustrated in FIG. 3 for B frames and FIG. 6 for P frames); (2) Pixel Projection (e.g., as illustrated in FIG. 8 for B frames and FIG. 9 for P frames); (3) Spatial MV Prediction (e.g., median value of motion vectors of collocated macroblocks); (4) Weighted average of Motion Projection and Spatial Prediction; or (5) other like techniques.
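
Models (3) and (4) above could be sketched, under stated assumptions, as follows. The component-wise median, the choice of candidate vectors, the default weight of 0.5, and the names median_mv and weighted_prediction are illustrative assumptions rather than details taken from the description.

```python
from statistics import median


def median_mv(candidate_mvs):
    """Component-wise median of a list of candidate (x, y) motion vectors."""
    xs = [mv[0] for mv in candidate_mvs]
    ys = [mv[1] for mv in candidate_mvs]
    return (median(xs), median(ys))


def weighted_prediction(projected_mv, spatial_mv, weight=0.5):
    """Blend a motion-projected vector with a spatially predicted one.

    weight -- share given to the projected vector; the spatial vector
              receives the remaining share (1 - weight)
    """
    return tuple(
        weight * p + (1 - weight) * s
        for p, s in zip(projected_mv, spatial_mv)
    )
```

If a weighting of this kind is used, the encoder and the decoder would need to apply the same weight so that both derive identical vectors without any motion information being transmitted.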

[0080] Other prediction models (e.g. acceleration, filtering, etc.) mayalso be used. If only one of these models is to be used, then thisshould be common in both the encoder and the decoder. Otherwise, one mayuse submodes which will immediately guide the decoder as to which modelit should use. Those skilled in the art will also recognize thatmulti-referencing a block or macroblock is also possible using anycombination of the above models.

[0081] In FIG. 12, an improved video encoding environment 1200 includes a video encoder 1202 that receives video data 1004 and outputs a corresponding encoded video data bitstream.

[0082] Here, video encoder 1202 has been modified to include improvement 1204. Improvement 1204 includes an additional motion vector (MV) buffer module 1206 and a DIRECT decision module 1208. More specifically, as shown, MV buffer module 1206 is configured to receive as inputs the output from frame buffer module 1026 and the output from ME module 1008. The output from MV buffer module 1206 is provided, along with the output from ME module 1008, as an input to DIRECT decision module 1208. The output from DIRECT decision module 1208 is then provided as an input to MC module 1022 along with the output from frame buffer module 1026.

[0083] For the exemplary architecture to work successfully, the Motion Information from the previously coded frame is stored intact, which is the purpose for adding MV buffer module 1206. MV buffer module 1206 can be used to store motion vectors. In certain implementations, MV buffer module 1206 may also store information about the reference frame used and the Motion Mode used. In the case of acceleration, for example, additional buffering may be useful for storing motion information of the 2nd or even N previous frames when, for example, a more complicated model for acceleration is employed.

[0084] If a macroblock, subblock, or pixel is not associated with a Motion Vector (i.e., a macroblock is intra coded), then for such block it is assumed that the Motion Vector used is (0, 0) and that only the previous frame was used as reference.
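
By way of a non-limiting illustration, the storage behavior described for MV buffer module 1206 might be sketched as follows. This is a minimal sketch only: the names `MotionInfo`, `MVBuffer`, `store`, and `lookup` are hypothetical and do not appear in the figures, and the convention that reference index 0 denotes the previous coded frame is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MotionInfo:
    """Motion vector plus reference frame index (hypothetical container)."""
    mv: Tuple[int, int]
    ref_frame: int  # assumed convention: 0 means the previous coded frame


# Default assumed for blocks without motion information (e.g., intra coded).
ZERO_MOTION = MotionInfo(mv=(0, 0), ref_frame=0)


class MVBuffer:
    """Keeps the motion information of the previously coded frame."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, int], MotionInfo] = {}

    def store(self, block_xy: Tuple[int, int], info: Optional[MotionInfo]) -> None:
        # None marks a block with no motion vector; it is stored as zero motion.
        self._store[block_xy] = ZERO_MOTION if info is None else info

    def lookup(self, block_xy: Tuple[int, int]) -> MotionInfo:
        # Blocks never stored are also treated as zero motion.
        return self._store.get(block_xy, ZERO_MOTION)
```

In such a sketch, a later frame coded in a Direct Prediction mode would call `lookup` for the collocated block rather than reading motion information from the bitstream.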

[0085] If multi-frame referencing is used, one may select to use the motion information as is, and/or to interpolate the motion information with reference to the previous coded frame. This is essentially a design choice, but in practice it also appears that, especially for the case of (0, 0) motion vectors, it is less likely that the current block is still being referenced from a much older frame.

[0086] One may combine Direct Prediction with an additional set of Motion Information which is, unlike before, encoded as part of the Direct Prediction. In such a case the prediction can, for example, be a multi-hypothesis prediction of both the Direct Prediction and the Motion Information.

[0087] Since there are several possible Direct Prediction submodes that one may combine, such could also be combined within a multi-hypothesis framework. For example, the prediction from motion projection could be combined with that of pixel projection and/or spatial MV prediction.

[0088] Direct Prediction can also be used at the subblock level within a macroblock. This is already done for B frames inside the current H.26L codec, but currently only using Motion Projection and not Pixel Projection or their combinations.

[0089] For B frame coding, one may perform Direct Prediction from only one direction (forward or backward) and not necessarily always from both sides. One may also use Direct Prediction inside the Bidirectional mode of B frames, where one of the predictions uses Direct Prediction.

[0090] In the case of Multi-hypothesis images, for example, it is possible that a P frame references a future frame. Here, proper scaling and/or inversion of the motion information can be performed, similar to B frame motion interpolation.

[0091] Run-length coding, for example, can also be used, according to which, if subsequent “equivalent” Direct P modes are used in coding a frame or slice, then these can be encoded using a run-length representation.
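
The run-length representation mentioned above might, for example, be sketched as follows. The function names and the string labels used for the modes are hypothetical; the sketch only shows consecutive identical mode symbols being collapsed into (mode, count) pairs, and does not model any particular entropy code.

```python
from itertools import groupby


def run_length_encode(modes):
    """Collapse runs of identical mode symbols into (mode, run length) pairs."""
    return [(mode, len(list(group))) for mode, group in groupby(modes)]


def run_length_decode(runs):
    """Expand (mode, run length) pairs back into the per-block mode list."""
    return [mode for mode, count in runs for _ in range(count)]
```

For a slice in which many consecutive macroblocks use the same Direct P mode, the encoded list is much shorter than the per-macroblock list, which is the source of the possible saving.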

[0092] DIRECT decision module 1208 essentially decides whether the Direct Prediction mode should be used instead of the pre-existing Inter or Intra modes. By way of example, the decision may be based on joint Rate/Distortion Optimization criteria, and/or on separate bitrate or distortion requirements or restrictions.
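
One way such a joint Rate/Distortion decision might be sketched is shown below, assuming a cost of the form J = D + λ·R. The function names, the mode labels, and the distortion and rate figures in the usage note are hypothetical and serve only to illustrate the comparison.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R (assumed form of the joint criterion)."""
    return distortion + lam * rate_bits


def choose_mode(candidates, lam):
    """Return (mode, cost) with the lowest cost.

    candidates maps a mode label to a (distortion, rate_bits) pair.
    """
    best_mode, best_cost = None, float("inf")
    for mode, (distortion, rate_bits) in candidates.items():
        cost = rd_cost(distortion, rate_bits, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

With illustrative candidates such as `{"INTER": (1000.0, 120), "INTRA": (900.0, 400), "DIRECT": (1100.0, 5)}`, a large λ favors the Direct Prediction mode because it spends almost no bits on motion, while a small λ favors the mode with the lowest distortion.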

[0093] It is also possible, in alternate implementations, that DIRECT decision module 1208 precedes ME module 1008. In such a case, if Direct Prediction can immediately provide a good enough estimate of the motion parameters, based on some predefined conditions, ME module 1008 could be completely bypassed, thus also considerably reducing the computation of the encoding.

[0094] Exemplary Decoding Processes:

[0095] Reference is now made to FIG. 11, which depicts an exemplary conventional decoding environment 1100 having a video decoder 1102 that receives an encoded video data bitstream 1104 and outputs corresponding (decoded) video data 1120.

[0096] Encoded video data bitstream 1104 is provided as an input to a variable length decoding (VLD) module 1106. The output of VLD module 1106 is provided as an input to a QP⁻¹ module 1108, and as an input to an MC module 1110. The output from QP⁻¹ module 1108 is provided as an input to an IDCT module 1112. The output of IDCT module 1112 is provided as an input to a summation module 1114, which also receives as an input an output from MC module 1110. The output from summation module 1114 is provided as an input to a loop filter module 1116. The output of loop filter module 1116 is provided to a frame buffer module 1118. An output from frame buffer module 1118 is provided as an input to MC module 1110. Frame buffer module 1118 also outputs (decoded) video data 1120.

[0097] An exemplary improved decoder 1302 for use in a Direct Prediction environment 1300 further includes an improvement 1306. Here, as shown in FIG. 13, improved decoder 1302 receives encoded video data bitstream 1210, for example, as output by improved video encoder 1202 of FIG. 12, and outputs corresponding (decoded) video data 1304.

[0098] Improvement 1306, in this example, is operatively inserted between MC module 1110 and a VLD module 1106′. Improvement 1306 includes an MV buffer module 1308 that receives as an input an output from VLD module 1106′. The output of MV buffer module 1308 is provided as a selectable input to a selection module 1312 of improvement 1306. A block mode module 1310 is also provided in improvement 1306. Block mode module 1310 receives as an input an output from VLD module 1106′. An output of block mode module 1310 is provided as an input to VLD module 1106′, and also as a controlling input to selection module 1312. An output from VLD module 1106′ is provided as a selectable input to selection module 1312. Selection module 1312 is configured to selectably provide either an output from MV buffer module 1308 or VLD module 1106′ as an input to MC module 1110.

[0099] With improvement 1306, for example, motion information for each pixel can be stored, and if the mode of a macroblock is identified as the Direct Prediction mode, then the stored motion information and the proper Projection or prediction method are selected and used. It should be noted that if only Motion Projection is used, then the changes in an existing decoder are very minor, and the additional complexity that is added to the decoder could be considered negligible.
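
The selection performed by selection module 1312 might be sketched, under simplifying assumptions, as follows. The function name `decode_motion`, the mode labels, and the per-block list layout are hypothetical. The sketch assumes that only Motion Projection is used and that it reduces to reusing the collocated motion vector stored for the previous frame.

```python
def decode_motion(blocks, prev_frame_mvs):
    """Pick, per block, either the decoded MV or the buffered MV.

    blocks: list of (mode, decoded_mv_or_None), one entry per block.
    prev_frame_mvs: motion vectors stored for the collocated blocks of the
    previously decoded frame (the role played by the MV buffer).
    """
    current_mvs = []
    for index, (mode, decoded_mv) in enumerate(blocks):
        if mode == "DIRECT":
            mv = prev_frame_mvs[index]  # selectable input from the MV buffer
        elif mode == "INTRA":
            mv = (0, 0)                 # intra blocks are stored as zero motion
        else:
            mv = decoded_mv             # selectable input from the VLD output
        current_mvs.append(mv)
    return current_mvs
```

The returned list would then replace the buffered motion vectors before the next frame is decoded, so that a later Direct Prediction block sees the most recent motion information.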

[0100] If submodes are used, then improved decoder 1302 can, for example, be configured to perform steps opposite to the prediction steps that improved encoder 1202 performs, in order to properly decode the current macroblock.

[0101] Again, non-referenced pixels (such as intra blocks) may be considered as having zero motion for the motion storage.

[0102] Some Exemplary Schemes

[0103] Considering that there are several possible predictors that may be immediately used with Direct Prediction, for brevity this description covers in greater detail a smaller subset of cases, which are not only rather efficient but also simple to implement. In particular, the following models are examined in greater demonstrative detail:

[0104] (A) In this example, Motion Projection is the only mode used. No run-length coding of Direct Modes is used, whereas residue information is also transmitted. A special modification of the motion parameters is performed in the case that a zero motion vector is used. In such a situation, the reference frame for the Direct Prediction is always set to zero (e.g., previous encoded frame). Furthermore, intra coded blocks are considered as having zero motion and reference frame parameters.

[0105] (B) This example is like example (A) except that no residue is transmitted.

[0106] (C) This example is basically a combination of examples (A) and (B), in that if QP<n (e.g., n=24) then the residue is also encoded, otherwise no residue is transmitted.

[0107] (D) This example is an enhanced Direct Prediction scheme that combines three submodes, namely:

[0108] (1) Motion Projection ($\overrightarrow{MV}_{MP}$);

[0109] (2) Spatial MV Prediction ($\overrightarrow{MV}_{SP}$); and

[0110] (3) A weighted average of these two cases $\left( \frac{\overrightarrow{MV}_{MP} + 2\,\overrightarrow{MV}_{SP}}{3} \right)$.

[0111] Wherein, residue is not transmitted for QP<n (e.g., n=24). Here, run-length coding is not used. The partitioning of the submodes can be set as follows:

Submodes            Code
Spatial Predictor   0
Motion Projection   1
Weighted Average    2

[0112] The best submode could be selected using a Rate Distortion Optimization process (best compromise between bitrate and quality).

[0113] (E) A combination of example (C) with Pixel Projection. Here, for example, an average of two predictions is used for the Direct Prediction Mode.

[0114] (F) This is a combination of example (C) with Motion Copy R2 (see, e.g., Jani Lainema and Marta Karczewicz, “Skip mode motion compensation”, Doc. JVT-C027, May 2002, which is incorporated herein by reference) or the like. This case can be seen as an alternative to the usage of the Spatial MV Predictor used in example (D), with one difference being that the spatial predictor, under certain conditions, completely replaces the zero skip mode, and that this example (F) can be run-length encoded, thus being able to achieve more efficient performance.
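
The three submodes of example (D) might be sketched as follows. The submode codes follow the partitioning given for example (D); the function name `direct_prediction` and the tuple representation of motion vectors are hypothetical, and the two input predictors are assumed to have been computed elsewhere.

```python
# Submode codes as partitioned in example (D).
SPATIAL_PREDICTOR = 0
MOTION_PROJECTION = 1
WEIGHTED_AVERAGE = 2


def direct_prediction(submode, mv_mp, mv_sp):
    """Return the Direct Prediction motion vector for the given submode.

    mv_mp: motion vector from Motion Projection, as an (x, y) tuple.
    mv_sp: motion vector from Spatial MV Prediction, as an (x, y) tuple.
    """
    if submode == SPATIAL_PREDICTOR:
        return mv_sp
    if submode == MOTION_PROJECTION:
        return mv_mp
    if submode == WEIGHTED_AVERAGE:
        # (MV_MP + 2 * MV_SP) / 3, applied per component.
        return tuple((mp + 2 * sp) / 3 for mp, sp in zip(mv_mp, mv_sp))
    raise ValueError(f"unknown submode code: {submode}")
```

A practical codec would likely round the weighted average to its motion vector precision; the sketch leaves the result unrounded because the rounding rule is not specified here.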

[0115] Motion Vector Prediction in Bidirectionally Predictive (B) Frames with Regard to Direct Mode:

[0116] The current JVT standard appears to be quite unclear on how a Direct Mode coded macroblock or block should be considered in the motion vector prediction within Bidirectionally Predicted (B) frames. Instead, it appears that the current software considers a Direct Mode Macroblock or subblock as having a “different reference frame” and thus does not use it in the prediction. Unfortunately, considering that there might still be high correlation between the motion vectors of a Direct predicted block and those of its neighbors, such a condition could considerably hinder the performance of B frames and reduce their efficiency. This could also reduce the efficiency of error concealment algorithms when applied to B frames.

[0117] In this section, exemplary alternative approaches are presented, which can improve the coding efficiency and increase the correlation of motion vectors within B frames, for example. This is done by considering a Direct Mode coded block essentially equivalent to a Bidirectionally predicted block within the motion prediction phase.

[0118] Direct Mode Macroblocks or blocks (for example, in the case of 8×8 sub-partitions) could considerably improve the efficacy of Bidirectionally Predicted (B) frames since they can effectively exploit temporal correlations of motion vector information of adjacent frames. The idea is essentially derived from temporal interpolation techniques where the assumption is made that if a block has moved from a position (x+dx, y+dy) at time t to a position (x, y) at time t+2, then, by using temporal interpolation, at time t+1 the same block must have essentially been at position: $\left( x + \frac{dx}{2},\; y + \frac{dy}{2} \right)$

[0119] This is illustrated, for example, in FIG. 14, which shows three frames, namely, a P frame 1400, a B frame 1402 and a P frame 1404, corresponding to times t, t+1, and t+2, respectively. The approach most often used in current encoding standards, though, instead assumes that the block at position (x, y) of the frame at time t+1 most likely can be found at positions: $\left( x + \frac{dx}{2},\; y + \frac{dy}{2} \right)$ at time t and $\left( x - \frac{dx}{2},\; y - \frac{dy}{2} \right)$ at time t+2.
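
The second approach, in which the block at (x, y) of the B frame is located in both neighboring P frames, might be sketched as follows. The function names are hypothetical, and the sketch assumes that the B frame lies exactly midway between the two P frames (times t, t+1, and t+2), so that the collocated motion vector (dx, dy) is simply halved.

```python
def direct_mode_vectors(mv):
    """Split the collocated motion vector (dx, dy) into two halves.

    Returns the offset into the frame at time t and the offset into the
    frame at time t+2, assuming equal temporal distances.
    """
    dx, dy = mv
    forward = (dx / 2, dy / 2)      # offset toward the frame at time t
    backward = (-dx / 2, -dy / 2)   # offset toward the frame at time t+2
    return forward, backward


def reference_positions(x, y, mv):
    """Positions at which the block at (x, y) of the B frame is predicted."""
    forward, backward = direct_mode_vectors(mv)
    pos_t = (x + forward[0], y + forward[1])
    pos_t2 = (x + backward[0], y + backward[1])
    return pos_t, pos_t2
```

For a block at (16, 8) with a collocated motion vector of (4, −2), the sketch yields (18, 7) in the frame at time t and (14, 9) in the frame at time t+2.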

[0120] The latter is illustrated in FIG. 15, which shows three frames, namely, a P frame 1500, a B frame 1502 and a P frame 1504, corresponding to times t, t+1, and t+2, respectively. Since the number of Direct Mode coded blocks within a sequence can be significant, whereas no residue and motion information are transmitted for such a case, the efficiency of B frames can be considerably increased. Run-length coding (for example, if the Universal Variable Length Code (UVLC) entropy coding is used) may also be used to improve performance even further.

[0121] Unfortunately, the current JVT standard does not clarify how the motion vector prediction of blocks adjacent to Direct Mode blocks should be performed. As it appears from the current software, Direct Mode blocks are currently considered as having a “different reference frame”, thus no spatial correlation is exploited in such a case. This could considerably reduce the efficiency of the prediction, and could also potentially affect the performance of error concealment algorithms applied to B frames in case such is needed.

[0122] By way of example, if one would like to predict the motion vector of E in the current codec, and A, B, C, and D were all Direct Mode coded, then the predictor will be set to (0, 0), which would not be a good decision.

[0123] In FIG. 16, for example, E is predicted from A, B, C, and D. Thus, if A, B, C, or D are Direct Mode coded then their actual values are not currently used in the prediction. This can be modified, however. Thus, for example, if A, B, C, or D are Direct Mode coded, then actual values of Motion Vectors and reference frames can be used in the prediction. This provides two selectable options: (1) if the collocated macroblock/block in the subsequent P frame is intra coded then the reference frame is set to −1; (2) if the collocated macroblock/block in the subsequent P frame is intra coded then the reference frame is assumed to be 0.

[0124] In accordance with certain aspects of the present invention, one may instead use the actual Motion information available from the Direct Mode coded blocks for performing the motion vector prediction. This will enable a higher correlation of the motion vectors within a B frame sequence, and thus can lead to improved efficiency.

[0125] One possible issue is how to appropriately handle Direct Mode Macroblocks for which the collocated block/macroblock in the subsequent frame was intra coded. Here, for example, two possible options include:

[0126] (1) Consider this macroblock/block as having a different reference frame, and thus do not use it in the motion vector prediction; and

[0127] (2) Consider this macroblock as having a (0, 0) motion vector and reference frame 0.
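
The two options above might be sketched with a simplified predictor as follows. This is not the median rule of any particular codec: the component-wise low median over all usable neighbors, the dictionary layout of a neighbor, and the names `neighbor_motion` and `predict_mv` are assumptions made for illustration only.

```python
from statistics import median_low


def neighbor_motion(neighbor, intra_collocated_as_zero):
    """Return (mv, ref) for a neighbor, or None if it must be ignored."""
    if neighbor["mode"] == "DIRECT" and neighbor.get("collocated_intra"):
        if intra_collocated_as_zero:
            return (0, 0), 0   # option (2): zero motion, reference frame 0
        return None            # option (1): treated as a different reference
    return neighbor["mv"], neighbor["ref"]


def predict_mv(neighbors, ref, intra_collocated_as_zero=True):
    """Component-wise median of neighbors that use reference frame `ref`.

    Direct Mode neighbors contribute their actual motion vectors instead of
    being skipped. Falls back to (0, 0) when no neighbor is usable.
    """
    usable = []
    for neighbor in neighbors:
        motion = neighbor_motion(neighbor, intra_collocated_as_zero)
        if motion is not None and motion[1] == ref:
            usable.append(motion[0])
    if not usable:
        return (0, 0)
    xs = [mv[0] for mv in usable]
    ys = [mv[1] for mv in usable]
    return (median_low(xs), median_low(ys))
```

With neighbors whose motion vectors are (4, 2), (6, 2), and (5, −1), two of them Direct Mode coded, the sketch predicts (5, 2) rather than the (0, 0) that would result from discarding the Direct Mode neighbors.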

[0128] In accordance with certain other exemplary implementations of the present invention, a further modification can be made in the de-blocking filter process. For the Direct Mode case, a de-blocking filter process can be configured to compare stored motion vector information that is taken from Direct Mode coded blocks; otherwise these would usually be considered as zero. In another modification, however, one may instead configure the de-blocking filter process to compare the (exact) motion vectors, regardless of the block type that is used. Thus, in certain implementations, if no residue is transmitted for Direct Coded blocks, a “stronger” de-blocking filter can provide further improved performance.

[0129] Furthermore, in certain other implementations, the Rate Distortion Decision for B frames can be redesigned, since it is quite likely that, for certain implementations of the motion vector prediction scheme, a different Lagrangian parameter λ used in Rate Distortion Optimization decisions may lead to further coding efficiency. Such λ can be taken, for example, as:

λ=0.85×2^(Qp/3)
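
As a worked illustration of the expression above, the following short sketch evaluates λ for a given quantizer value. The function name `lagrangian_lambda` is hypothetical.

```python
def lagrangian_lambda(qp):
    """Evaluate lambda = 0.85 * 2^(QP / 3) for a quantizer value QP."""
    return 0.85 * 2 ** (qp / 3)
```

Under this expression λ doubles each time QP increases by 3, so that at QP=24, for example, λ is 0.85×2⁸ = 217.6.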

[0130] Inter Mode Decision Refinement:

[0131] The JVT standard currently has an overwhelming performance advantage over most other current Block Based coding standards. Part of this performance can be attributed to the possibility of using variable block sizes ranging from 16×16 down to 4×4 (pixels), instead of having fixed block sizes. Doing so, for example, allows for a more effective exploitation of temporal correlation. Unfortunately, it has been found that, due to the Mode Decision techniques currently existing in conventional coding logic (e.g., hardware, firmware, and/or software), mode decisions might not be optimally performed, thus wasting bits that could be better allocated.

[0132] In this section, further methods and apparatuses are provided that at least partly solve this problem and/or others. Here, the exemplary methods and apparatuses have been configured for use with at least 16×8 and 8×16 (pixel) block modes. Furthermore, using a relatively simple solution where at least one additional criterion is introduced, a saving of between approximately 5% and 10% is provided in the complexity of the encoder.

[0133] Two key features of the JVT standard are variable macroblock mode selection and Rate Distortion Optimization. A 16×16 (pixel) macroblock can be coded using different partitioning modes for which motion information is also transmitted. The selection of the mode to be used can be performed in the Rate Distortion Optimization phase of the encoding, where a joint decision of best possible quality at best possible bitrate is attempted. Unfortunately, since the assignment of the best possible motion information for each subpartition is done in an entirely different process of the encoding, it is possible in some cases that a non-16×16 mode (e.g., 16×8 or 8×16 (pixel)) carries motion information that is equivalent to that of a 16×16 macroblock. Since the motion predictors used for each mode could also be different, it is quite possible in many cases that such 16×16 type motion information could be different from the one assigned to the 16×16 mode. Furthermore, under certain conditions, the Rate Distortion Optimization may in the end decide to use the non-16×16 macroblock type, even though it contains 16×16 motion information, without examining whether such could have been better if coded using a 16×16 mode.

[0134] Recognizing this, an exemplary system can be configured to determine when such a case occurs, such that improved performance may be achieved. In accordance with certain exemplary implementations of the present invention, two additional modes, e.g., referred to as P2to1 and P3to1, are made available within the Mode decision process/phase. The P2to1 and P3to1 modes are enabled when the motion information of a 16×8 and 8×16 subpartitioning, respectively, is equivalent to that of a 16×16 mode.

[0135] In certain implementations all motion vectors and reference frames assigned to each partition may be equal. As such, the equivalent mode can be enabled and examined during a rate distortion process/phase. Since the residue and distortion information will not likely change compared to the subpartition case, they can be reused without significantly increasing computation.

[0136] Considering, though, that the Rate Distortion Mode Decision is not perfect, it is possible that the addition and consideration of these two additional modes regardless of the current best mode may, in some limited cases, reduce the efficiency instead of improving it. As an alternative, one may enable these modes only when the corresponding subpartitioning mode was also the best possible one according to the Mode decision employed. Doing so may yield improvements (e.g., bitrate reduction) versus the other logic (e.g., codecs, etc.), while not affecting the PSNR.

[0137] If the motion information of the 16×8 or 8×16 subpartitioning is equivalent to that of the 16×16 mode, then performing mode decision for such a mode may be unnecessary. For example, if the motion vector predictor of the first subpartition is exactly the same as the motion vector predictor of the 16×16 mode, performing mode decision is unnecessary. If such a condition is satisfied, one may completely skip this mode during the Mode Decision process. Doing so can significantly reduce complexity since it would not be necessary, for this mode, to perform DCT, Quantization, and/or other like Rate Distortion processes/measurements, which tend to be rather costly during the encoding process.

[0138] In certain other exemplary implementations, the entire process can be further extended to a Tree-structured macroblock partition as well. See, e.g., Heiko Schwarz and Thomas Wiegand, “Tree-structured macroblock partition”, Doc. VCEG-N17, December 2001.

[0139] An Exemplary Algorithm

[0140] Below are certain acts that can be performed to provide a mode refinement in an exemplary codec or other like logic (note that in certain other implementations, the order of the acts may be changed and/or certain acts may be performed together):

[0141] Act 1: Set Valid[P2to1]=Valid[P3to1]=0.

[0142] Act 2: Perform Motion Vector and Reference frame decision for each possible Inter Mode. Let $\overrightarrow{MV}_{16\times16}$, $\overrightarrow{MVP}_{16\times16}$, and $refframe_{16\times16}$ be the motion vector, motion vector predictor, and reference frame of the 16×16 mode, $\{\overrightarrow{MV^{a}}_{16\times8}, \overrightarrow{MV^{b}}_{16\times8}\}$, $\{\overrightarrow{MVP^{a}}_{16\times8}, \overrightarrow{MVP^{b}}_{16\times8}\}$, and $\{refframe^{a}_{16\times8}, refframe^{b}_{16\times8}\}$ the corresponding information for the 16×8 mode, and $\{\overrightarrow{MV^{a}}_{8\times16}, \overrightarrow{MV^{b}}_{8\times16}\}$, $\{\overrightarrow{MVP^{a}}_{8\times16}, \overrightarrow{MVP^{b}}_{8\times16}\}$, and $\{refframe^{a}_{8\times16}, refframe^{b}_{8\times16}\}$ for the 8×16 mode.

[0143] Act 3: If ($\overrightarrow{MV^{a}}_{16\times8} \neq \overrightarrow{MV^{b}}_{16\times8}$) OR ($refframe^{a}_{16\times8} \neq refframe^{b}_{16\times8}$), then goto Act 7.

[0144] Act 4: If ($\overrightarrow{MV^{a}}_{16\times8} \neq \overrightarrow{MV}_{16\times16}$) OR ($\overrightarrow{MVP^{a}}_{16\times8} \neq \overrightarrow{MVP}_{16\times16}$) OR ($refframe^{a}_{16\times8} \neq refframe_{16\times16}$), then goto Act 6.

[0145] Act 5: Valid[16×8]=0; goto Act 7 (e.g., disable 16×8 mode if identical to 16×16, to reduce complexity).

[0146] Act 6: Valid[P2to1]=1 (e.g., enable refinement mode for 16×8); $\overrightarrow{MV}_{P2to1} = \overrightarrow{MV^{a}}_{16\times8}$; $refframe_{P2to1} = refframe^{a}_{16\times8}$.

[0147] Act 7: If ($\overrightarrow{MV^{a}}_{8\times16} \neq \overrightarrow{MV^{b}}_{8\times16}$) OR ($refframe^{a}_{8\times16} \neq refframe^{b}_{8\times16}$), then goto Act 11.

[0148] Act 8: If ($\overrightarrow{MV^{a}}_{8\times16} \neq \overrightarrow{MV}_{16\times16}$) OR ($\overrightarrow{MVP^{a}}_{8\times16} \neq \overrightarrow{MVP}_{16\times16}$) OR ($refframe^{a}_{8\times16} \neq refframe_{16\times16}$), then goto Act 10.

[0149] Act 9: Valid[8×16]=0; goto Act 11 (e.g., disable 8×16 mode if identical to 16×16, to reduce complexity).

[0150] Act 10: Valid[P3to1]=1 (e.g., enable refinement mode for 8×16); $\overrightarrow{MV}_{P3to1} = \overrightarrow{MV^{a}}_{8\times16}$; $refframe_{P3to1} = refframe^{a}_{8\times16}$.

[0151] Act 11: Perform Rate Distortion Optimization for all Inter & Intra modes if (Valid[MODE]=1), where MODE ∈ {INTRA4×4, INTRA16×16, SKIP, 16×16, 16×8, 8×16, P8×8}, using the Lagrangian functional:

[0152] $J(s, c, MODE \mid QP, \lambda_{MODE}) = SSD(s, c, MODE \mid QP) + \lambda_{MODE} \cdot R(s, c, MODE \mid QP)$. Set the best mode to BestMode.

[0153] Act 12: If (BestMode!=16×8) then Valid[P2to1]=0 (note that this act is optional).

[0154] Act 13: If (BestMode!=8×16) then Valid[P3to1]=0 (note that this act is optional).

[0155] Act 14: Perform Rate Distortion Optimization for the two additional modes if (Valid[MODE]=1), where MODE ∈ {P2to1, P3to1} (e.g., modes are considered equivalent to 16×16 modes).

[0156] Act 15: Set BestMode to the overall best mode found.
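
The validity bookkeeping of Acts 1 through 10, together with the optional Acts 12 and 13, might be sketched as follows. The Rate Distortion Optimization of Acts 11 and 14 is not modeled. The function names and the dictionary layout are hypothetical, and the sketch assumes the pairing stated in paragraph [0134], namely that P2to1 corresponds to the 16×8 mode and P3to1 to the 8×16 mode.

```python
def refine_modes(m16x16, m16x8, m8x16):
    """Apply Acts 1-10 and return (valid flags, refinement motion info).

    m16x16: dict with keys "mv", "mvp", "ref".
    m16x8, m8x16: dicts with partitions "a" and "b", each a dict with
    keys "mv", "mvp", "ref".
    """
    # Act 1: the two refinement modes start disabled.
    valid = {"16x16": 1, "16x8": 1, "8x16": 1, "P2to1": 0, "P3to1": 0}
    refined = {}
    base = (m16x16["mv"], m16x16["mvp"], m16x16["ref"])
    for mode, extra, parts in (("16x8", "P2to1", m16x8),
                               ("8x16", "P3to1", m8x16)):
        a, b = parts["a"], parts["b"]
        # Acts 3 and 7: partitions differ, so this is genuine split motion.
        if a["mv"] != b["mv"] or a["ref"] != b["ref"]:
            continue
        if (a["mv"], a["mvp"], a["ref"]) != base:
            # Acts 6 and 10: 16x16-type motion that differs from the 16x16 mode.
            valid[extra] = 1
            refined[extra] = {"mv": a["mv"], "ref": a["ref"]}
        else:
            # Acts 5 and 9: identical to the 16x16 mode, so skip this mode.
            valid[mode] = 0
    return valid, refined


def prune_refinements(valid, best_mode):
    """Optional Acts 12 and 13: keep a refinement only if its source mode won."""
    pruned = dict(valid)
    if best_mode != "16x8":
        pruned["P2to1"] = 0
    if best_mode != "8x16":
        pruned["P3to1"] = 0
    return pruned
```

In this sketch a mode whose flag is 0 would simply be left out of the subsequent Rate Distortion Optimization, which is where the complexity saving described above would arise.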

[0157] Applying Exemplary Direct Prediction Techniques For Interlace Coding:

[0158] Due to the increased interest in interlaced video coding inside the H.26L standard, several proposals have been presented on enhancing the encoding performance of interlaced sequences. In this section, techniques are presented that can be implemented in the current syntax of H.26L, and/or other like systems. These exemplary techniques can provide performance enhancement. Furthermore, Direct P Prediction technology is introduced, similar to Direct B Prediction, which can be applied in both interlaced and progressive video coding.

[0159] Further Information On Exemplary Direct P Prediction Techniques:

[0160] Direct Mode of motion vectors inside B-frames can significantly benefit encoding performance since it can considerably reduce the bits required for motion vector encoding, especially considering that up to two motion vectors have to be transmitted. If, though, a block is coded using Direct Mode, no motion vectors are necessary; instead, these are calculated as temporal interpolations of the motion vectors of the collocated blocks in the first subsequent reference image. A similar approach for P frames appears to have never been considered, since the structure of P frames and of their corresponding macroblocks was much simpler, while each macroblock required only one motion vector. Adding such a mode would instead, most likely, have incurred a significant overhead, thus possibly negating any possible gain.

[0161] In H.26L, on the other hand, P frames were significantly enhanced by introducing several additional macroblock Modes. As described previously, in many cases it might even be necessary to transmit up to 16 motion vectors per macroblock. Considering this additional Mode Overhead that P frames in H.26L may contain, an implementation of Direct Prediction of the motion vectors could be viable. In such a way, all bits for the motion vectors and for the reference frame used can be saved at only the cost of the additional mode; for example, see FIG. 4.

[0162] Even though a more straightforward method of Direct P prediction is to select the Motion vectors of the collocated pixels in the previous frame, in other implementations one may also consider Motion Acceleration as an alternative solution. This comes from the fact that motion may be changing from frame to frame rather than remaining constant, so that by using acceleration better results could be obtained; for example, see FIG. 7.

[0163] Such techniques can be further applied to progressive video coding. Still, considering the correlation that fields may have in some cases inside interlace sequences, such as, for example, regions with constant horizontal-only movement, this approach can also help improve coding efficiency for interlace sequence coding. This is particularly beneficial for known field type frames, for example, if it is assumed that the motion of adjacent fields is the same. In this type of arrangement, same parity fields can be considered as new frames and are sequentially coded without taking the interlace feature into consideration. Such is entirely left to the decoder. By using this exemplary Direct P mode, though, one can use one set of motion vectors for the first field macroblock to be coded (e.g., of size 16×16 pixels), whereas the second field at the same location reuses the same motion information. The only other information necessary to be sent is the coded residue image. In other implementations, it is possible to further improve upon these techniques by considering correlations between the residue images of the two collocated field Blocks.

[0164] In order to allow Direct Mode in P frames, it is basically necessary to add one additional Inter Mode into the system. Thus, instead of having only 8 Inter Modes, in one example, one can now use 9, which are shown below:

INTER MODES          Description
COPY_MB       0      Skip macroblock Mode
M16×16_MB     1      One 16 × 16 block
M16×8_MB      2      Two 16 × 8 blocks
M8×16_MB      3      Two 8 × 16 blocks
M8×8_MB       4      Four 8 × 8 blocks
M8×4_MB       5      Eight 8 × 4 blocks
M4×8_MB       6      Eight 4 × 8 blocks
M4×4_MB       7      Sixteen 4 × 4 blocks
PDIRECT_MB    8      Copy Mode and motion vectors of collocated macroblock in previous frame
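
The nine Inter Modes listed above might be represented, for illustration, as follows. The identifiers use the letter "x" in place of "×" because the latter is not valid in a Python name. The helper `motion_vector_count` is hypothetical and assumes one transmitted motion vector per partition, with none transmitted for COPY_MB or PDIRECT_MB.

```python
from enum import IntEnum


class InterMode(IntEnum):
    """Inter Mode codes, including the additional Direct P mode."""
    COPY_MB = 0
    M16x16_MB = 1
    M16x8_MB = 2
    M8x16_MB = 3
    M8x8_MB = 4
    M8x4_MB = 5
    M4x8_MB = 6
    M4x4_MB = 7
    PDIRECT_MB = 8


# Partition size (width, height) in pixels for the modes that carry motion.
PARTITION_SIZE = {
    InterMode.M16x16_MB: (16, 16),
    InterMode.M16x8_MB: (16, 8),
    InterMode.M8x16_MB: (8, 16),
    InterMode.M8x8_MB: (8, 8),
    InterMode.M8x4_MB: (8, 4),
    InterMode.M4x8_MB: (4, 8),
    InterMode.M4x4_MB: (4, 4),
}


def motion_vector_count(mode):
    """Motion vectors transmitted for a 16x16 macroblock coded in `mode`."""
    size = PARTITION_SIZE.get(mode)
    if size is None:
        return 0  # COPY_MB and PDIRECT_MB transmit no motion vectors
    width, height = size
    return (16 // width) * (16 // height)
```

The count ranges from 1 for M16x16_MB up to 16 for M4x4_MB, whereas PDIRECT_MB costs only the mode code itself.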

[0165] In general, such exemplary Direct Modes for P frames can appear if the collocated macroblock was also of INTER type (excluding Skip macroblock, but including Direct Mode), since in other cases there is no motion information that could be used. In the case of the previous macroblock also being coded in Direct P Mode, the most recent Motion Vectors and Mode for this macroblock are considered instead. To more efficiently handle the cases in which this Mode would not logically appear, and in particular if INTRA mode was used, one may choose to allow this Mode to also appear in such cases, with the Mode now signifying a second Skip Macroblock Mode where the information is copied not from the previous frame, but from the one before it. In this case, no residue information is encoded. This is particularly useful for Interlace sequences, since it is more likely that a macroblock can be found with higher accuracy from the same parity field frame, and not from the previously coded field frame as was presented in previous techniques.

[0166] For further improved efficiency, if a set of two Field type frames is used when coding interlace images, the Skip Macroblock Mode can be configured to use the same parity field images. If Direct P mode is used as a skipping flag, for example, then the different parity is used instead. An additional benefit of Direct P mode is that it may allow for a significant complexity reduction in the encoder, since it is possible to allow the system to perform a pre-check as to whether the Direct P mode gives a satisfactory enough solution, and if so, no additional computation may be necessary for the mode decision and motion estimation of that particular block. To also address the issue of motion vector coding, the motion vectors used for Direct P coding can be used “as is” for the calculation of a MEDIAN predictor.

[0167] Best Field First technique & Field Reshuffling:

[0168] Coding of interlaced sequences that supports both interlace frame material and separate interlace field images inside the same stream would likely provide a much better solution than coding using only one of the two methods. The separate interlace field technique has some additional benefits, such as, for example, de-blocking, and in particular can provide enhanced error resilience. If an error happens inside one field image, for example, the error can be easily concealed using the information from the second image.

[0169] This is not the case for the frame based technique, where, especially when considering the often large size of and bits used by such frames, errors inside such a frame can happen with much higher probability. Reduced correlation between pixels/blocks may not promote error recovery.

[0170] Here, one can further improve on the field/frame coding concept by allowing the encoder to select which field should be encoded first, while disregarding which field is to be displayed first. This could be handled automatically on a decoder, where a larger buffer will be needed for storing a future field frame before displaying it. For example, even though the top field precedes the bottom field in terms of time, the coding efficiency might be higher if the bottom field is coded and transmitted first, followed by the top field frame. The decision may be made, for example, in the Rate Distortion Optimization process/phase, where one first examines what the performance will be if the Odd field is coded first followed by the Even field, and then what the performance will be if the Even field is instead coded first and used as a reference for the Odd field. Such a method implies that both the encoder and the decoder should know which field should be displayed first, and that any reshuffling is done seamlessly. It is also important that, even though the Odd field was coded first, both encoder and decoder are aware of this change when indexing the frame for the purpose of INTER/INTRA prediction. Illustrative examples of such a prediction scheme, using 4 reference frames, are depicted in FIG. 17 and FIG. 18. In FIG. 17, interlace coding is shown using an exemplary Best Field First scheme in P frames. In FIG. 18, interlace coding is shown using a Best Field First scheme in B frames.

[0171] In the case of coding joint field/frame images, the scheme illustratively depicted in FIG. 19 may be employed. Here, an exemplary implementation of a Best Field First scheme with frame and field based coding is shown. If two frames are used for the frame based motion estimation, then at least five field frames can be used for motion estimation of the fields, especially if field swapping occurs. This allows referencing of at least two field frames of the same parity. In general, 2×N+1 field frames should be stored if N full frames are to be used. Frames also could easily be interleaved and deinterleaved on the encoder and decoder for such processes.
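
The buffer sizing rule and the interleaving step above can be sketched as follows. The function names are hypothetical, and the convention that the top field holds the even-numbered rows of the frame is an assumption made for the example.

```python
from typing import List, Sequence, Tuple

Row = Sequence[int]


def field_buffer_size(num_reference_frames: int) -> int:
    """Field frames to store when N full frames are used: 2*N + 1."""
    if num_reference_frames < 0:
        raise ValueError("number of reference frames must be non-negative")
    return 2 * num_reference_frames + 1


def deinterleave(frame: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Split a frame into (top, bottom) fields; top = even rows (assumed)."""
    return list(frame[0::2]), list(frame[1::2])


def interleave(top: Sequence[Row], bottom: Sequence[Row]) -> List[Row]:
    """Rebuild a frame by alternating top-field and bottom-field rows."""
    if len(top) - len(bottom) not in (0, 1):
        raise ValueError("fields must have equal height, or top one row more")
    frame: List[Row] = []
    for i, row in enumerate(top):
        frame.append(row)
        if i < len(bottom):
            frame.append(bottom[i])
    return frame
```

With N = 2 reference frames, `field_buffer_size(2)` gives the five field frames mentioned in the text, and `interleave(*deinterleave(frame))` returns the original rows.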

[0172] Conclusion

[0173] Although the description above uses language that is specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the invention.

What is claimed is:
 1. A method for use in encoding video data within a sequence of video frames, the method comprising: encoding at least a portion of at least one reference frame to include motion information associated with said portion of said reference frame; defining at least a portion of at least one predictable frame that includes video data predictively correlated to said portion of said reference frame based on said motion information; and encoding at least said portion of said predictable frame without including corresponding motion information and including mode identifying data that identifies that said portion of said predictable frame can be directly derived using at least said motion information associated with said portion of said reference frame.
 2. The method as recited in claim 1, wherein said mode identifying data defines a type of prediction model required to decode said encoded portion of said predictable frame.
 3. The method as recited in claim 2, wherein said type of prediction model includes an enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode.
 4. The method as recited in claim 3, wherein said mode identifying data identifies said at least one submode.
 5. The method as recited in claim 1, wherein said method generates a plurality of video frames comprising at least one predictable frame selected from a group of predictable frames comprising a P frame and a B frame.
 6. The method as recited in claim 1, wherein said portion of said reference frame includes data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel within said predictable frame.
 7. The method as recited in claim 6, wherein said portion of said reference frame includes data for at least a portion of a macroblock within said reference frame, and said portion of said predictable frame includes data for at least a portion of a macroblock within said predictable frame.
 8. The method as recited in claim 1, wherein said reference frame temporally precedes said predictable frame within said sequence of video frames.
 9. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes velocity information.
 10. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes acceleration information.
 11. The method as recited in claim 1, wherein said portion of said reference frame and said portion of said predictable frame are spatially correlated.
 12. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes Pixel Projection information required to decode said encoded portion of said predictable frame.
 13. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes Spatial Motion Vector Prediction information required to decode said encoded portion of said predictable frame.
 14. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes combined Pixel Projection and Spatial Motion Vector Prediction information required to decode said encoded portion of said predictable frame.
 15. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes multi-hypothesis prediction information required to decode said encoded portion of said predictable frame.
 16. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame is null and said mode identifying data identifies that said portion of said reference frame includes said portion of said predictable frame.
 17. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes corresponding residue information.
 18. The method as recited in claim 1, wherein said motion information associated with said portion of said reference frame includes corresponding residue information only if a quantization parameter (QP) meets at least one defined condition.
 19. The method as recited in claim 18, wherein said at least one defined condition includes a threshold value.
 20. The method as recited in claim 18, wherein said threshold value is about QP>twenty-three.
 21. The method as recited in claim 1, wherein said at least one predictable frame and reference frame are part of an interlaced sequence of video fields.
 22. The method as recited in claim 21, wherein motion information is associated with at least one collocated pixel in said reference frame.
 23. The method as recited in claim 22, wherein encoding at least a portion of said at least one reference frame further includes encoding based on a correlation between residue images of two collocated field blocks.
 23. The method as recited in claim 21, further comprising for each of said reference frame and said predictable frame selecting an order in which fields within said interlaced sequence of video fields are to be encoded.
 24. The method as recited in claim 21, wherein said at least one predictable frame and reference frame each have at least two fields associated with them.
 25. The method as recited in claim 1, further comprising: selectively determining if a direct prediction mode is used instead of a pre-existing mode during said encoding of said at least said portion of said predictable frame based on at least one factor.
 26. A computer-readable medium having computer-implementable instructions for performing acts comprising: encoding video data for a sequence of video frames into at least one predictable frame selected from a group of predictable frames comprising a P frame and a B frame, by: encoding at least a portion of at least one reference frame to include motion information associated with said portion of said reference frame; defining at least a portion of at least one predictable frame that includes video data predictively correlated to said portion of said reference frame based on said motion information; and encoding at least said portion of said predictable frame without including corresponding motion information and including mode identifying data that identifies that said portion of said predictable frame can be directly derived using at least said motion information associated with said portion of said reference frame.
 27. The computer-readable medium as recited in claim 26, wherein said mode identifying data defines a type of prediction model required to decode said encoded portion of said predictable frame.
 28. The computer-readable medium as recited in claim 27, wherein said type of prediction model includes an enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode, and wherein said mode identifying data identifies said at least one submode.
 29. The computer-readable medium as recited in claim 26, wherein said portion of said reference frame includes data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel within said predictable frame.
 30. The computer-readable medium as recited in claim 26, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising velocity information and acceleration information.
 31. The computer-readable medium as recited in claim 26, wherein said motion information associated with said portion of said reference frame includes information required to decode said encoded portion of said predictable frame that is selected from a group comprising: Pixel Projection information; Spatial Motion Vector Prediction information; Weighted Pixel Projection and Spatial Motion Vector Prediction information; and Multi-hypothesis prediction information.
 32. The computer-readable medium as recited in claim 26, wherein said motion information associated with said portion of said reference frame is null and said mode identifying data identifies that said portion of said reference frame includes said portion of said predictable frame.
 33. The computer-readable medium as recited in claim 26, wherein said motion information associated with said portion of said reference frame includes corresponding residue information.
 34. The computer-readable medium as recited in claim 26, wherein said motion information associated with said portion of said reference frame includes corresponding residue information only if a quantization parameter (QP) meets at least one defined condition.
 35. The computer-readable medium as recited in claim 34, wherein said at least one defined condition includes a threshold value.
 36. The computer-readable medium as recited in claim 26, wherein said at least one predictable frame and reference frame are part of an interlaced sequence of video fields.
 37. The computer-readable medium as recited in claim 36, wherein motion information is associated with at least one collocated pixel in said reference frame.
 38. The computer-readable medium as recited in claim 37, wherein encoding at least a portion of said at least one reference frame further includes encoding based on a correlation between residue images of two collocated field blocks.
 39. The computer-readable medium as recited in claim 36, further comprising for each of said reference frame and said predictable frame selecting an order in which fields within said interlaced sequence of video fields are to be encoded.
 40. The computer-readable medium as recited in claim 36, wherein said at least one predictable frame and reference frame each have at least two fields associated with them.
 41. The computer-readable medium as recited in claim 26, having computer-implementable instructions for performing further acts comprising: selectively determining if a direct prediction mode is used instead of a pre-existing mode during said encoding of said at least said portion of said predictable frame based on at least one factor.
 42. An apparatus for use in encoding video data for a sequence of video frames into a plurality of video frames including at least one predictable frame selected from a group of predictable frames comprising a P frame and a B frame, said apparatus comprising: memory for storing motion information; and logic operatively coupled to said memory and configured to encode at least a portion of at least one reference frame to include motion information associated with said portion of said reference frame, determine at least a portion of at least one predictable frame that includes video data predictively correlated to said portion of said reference frame based on said motion information, and encode at least said portion of said predictable frame without including corresponding motion information and including mode identifying data that identifies that said portion of said predictable frame can be directly derived using at least said motion information associated with said portion of said reference frame.
 43. The apparatus as recited in claim 42, wherein said mode identifying data defines a type of prediction model required to decode said encoded portion of said predictable frame.
 44. The apparatus as recited in claim 43, wherein said type of prediction model includes an enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode, and wherein said mode identifying data identifies said at least one submode.
 45. The apparatus as recited in claim 42, wherein said portion of said reference frame includes data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel within said predictable frame.
 46. The apparatus as recited in claim 42, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising velocity information and acceleration information.
 47. The apparatus as recited in claim 42, wherein said motion information associated with said portion of said reference frame includes information required to decode said encoded portion of said predictable frame that is selected from a group comprising: Pixel Projection information; Spatial Motion Vector Prediction information; Weighted Pixel Projection and Spatial Motion Vector Prediction information; and Multi-hypothesis prediction information.
 48. The apparatus as recited in claim 42, wherein said motion information associated with said portion of said reference frame is null and said mode identifying data identifies that said portion of said reference frame includes said portion of said predictable frame.
 49. The apparatus as recited in claim 42, wherein said motion information associated with said portion of said reference frame includes corresponding residue information.
 50. The apparatus as recited in claim 42, wherein said motion information associated with said portion of said reference frame includes corresponding residue information only if a quantization parameter (QP) meets at least one defined condition.
 51. The apparatus as recited in claim 42, wherein said at least one predictable frame and reference frame are part of an interlaced sequence of video fields.
 52. The apparatus as recited in claim 51, wherein motion information is associated with at least one collocated pixel in said reference frame.
 53. The apparatus as recited in claim 52, wherein said logic encodes said at least a portion of said at least one reference frame based on a correlation between residue images of two collocated field blocks.
 54. The apparatus as recited in claim 51, wherein said logic is further configured to, for each of said reference frame and said predictable frame, select an order in which fields within said interlaced sequence of video fields are encoded.
 55. The apparatus as recited in claim 51, wherein said at least one predictable frame and reference frame each have at least two fields associated with them.
 56. The apparatus as recited in claim 42, wherein said logic is further configured to selectively determine if a direct prediction mode is used instead of a pre-existing mode when encoding said at least said portion of said predictable frame based on at least one factor.
 57. A method for use in decoding encoded video data that includes a plurality of video frames comprising at least one predictable frame selected from a group of predictable frames comprising a P frame and a B frame, the method comprising: determining motion information associated with at least a portion of at least one reference frame; buffering said motion information; determining mode identifying data that identifies that at least a portion of a predictable frame can be directly derived using at least said buffered motion information; and generating said portion of said predictable frame using said buffered motion information.
 58. The method as recited in claim 57, wherein said mode identifying data defines a type of prediction model required to decode said encoded portion of said predictable frame.
 59. The method as recited in claim 58, wherein said type of prediction model includes an enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode.
 60. The method as recited in claim 59, wherein said mode identifying data identifies said at least one submode.
 61. The method as recited in claim 57, wherein said portion of said reference frame includes data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel within said predictable frame.
 62. The method as recited in claim 61, wherein said portion of said reference frame includes data for at least a portion of a macroblock within said reference frame, and said portion of said predictable frame includes data for at least a portion of a macroblock within said predictable frame.
 63. The method as recited in claim 57, wherein said reference frame temporally precedes said predictable frame within said sequence of video frames.
 64. The method as recited in claim 57, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising velocity information and acceleration information.
 65. The method as recited in claim 57, wherein said portion of said reference frame and said portion of said predictable frame are spatially correlated.
 66. The method as recited in claim 57, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising Pixel Projection information, Spatial Motion Vector Prediction, combined Pixel Projection and Spatial Motion Vector Prediction information, and multi-hypothesis prediction information.
 67. The method as recited in claim 57, wherein said motion information associated with said portion of said reference frame is null and said mode identifying data identifies that said portion of said reference frame includes said portion of said predictable frame.
 68. A computer-readable medium having computer-implementable instructions for performing acts comprising: decoding encoded video data that includes a plurality of video frames comprising at least one predictable frame selected from a group of predictable frames comprising a P frame and a B frame, by: buffering motion information associated with at least a portion of at least one reference frame; determining mode identifying data that identifies that at least a portion of a predictable frame can be directly derived using at least said buffered motion information; and generating said portion of said predictable frame using said buffered motion information.
 69. The computer-readable medium as recited in claim 68, wherein said mode identifying data defines a type of prediction model required to decode said encoded portion of said predictable frame.
 70. The computer-readable medium as recited in claim 69, wherein said type of prediction model includes an enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode.
 71. The computer-readable medium as recited in claim 70, wherein said mode identifying data identifies said at least one submode.
 72. The computer-readable medium as recited in claim 68, wherein said portion of said reference frame includes data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel within said predictable frame.
 73. The computer-readable medium as recited in claim 72, wherein said portion of said reference frame includes data for at least a portion of a macroblock within said reference frame, and said portion of said predictable frame includes data for at least a portion of a macroblock within said predictable frame.
 74. The computer-readable medium as recited in claim 68, wherein said reference frame temporally precedes said predictable frame within said sequence of video frames.
 75. The computer-readable medium as recited in claim 68, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising velocity information and acceleration information.
 76. The computer-readable medium as recited in claim 68, wherein said portion of said reference frame and said portion of said predictable frame are spatially correlated.
 77. The computer-readable medium as recited in claim 68, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising Pixel Projection information, Spatial Motion Vector Prediction, combined Pixel Projection and Spatial Motion Vector Prediction information, and multi-hypothesis prediction information.
 78. The computer-readable medium as recited in claim 68, wherein said motion information associated with said portion of said reference frame is null and said mode identifying data identifies that said portion of said reference frame includes said portion of said predictable frame.
 79. An apparatus for use in decoding video data for a sequence of video frames into a plurality of video frames including at least one predictable frame selected from a group of predictable frames comprising a P frame and a B frame, said apparatus comprising: memory for storing motion information; and logic operatively coupled to said memory and configured to buffer in said memory motion information associated with at least a portion of at least one reference frame, ascertain mode identifying data that identifies that at least a portion of a predictable frame can be directly derived using at least said buffered motion information, and generate said portion of said predictable frame using said buffered motion information.
 80. The apparatus as recited in claim 79, wherein said mode identifying data defines a type of prediction model required to decode said encoded portion of said predictable frame.
 81. The apparatus as recited in claim 80, wherein said type of prediction model includes an enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode.
 82. The apparatus as recited in claim 81, wherein said mode identifying data identifies said at least one submode.
 83. The apparatus as recited in claim 79, wherein said portion of said reference frame includes data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel within said predictable frame.
 84. The apparatus as recited in claim 83, wherein said portion of said reference frame includes data for at least a portion of a macroblock within said reference frame, and said portion of said predictable frame includes data for at least a portion of a macroblock within said predictable frame.
 85. The apparatus as recited in claim 79, wherein said reference frame temporally precedes said predictable frame within said sequence of video frames.
 86. The apparatus as recited in claim 79, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising velocity information and acceleration information.
 87. The apparatus as recited in claim 79, wherein said portion of said reference frame and said portion of said predictable frame are spatially correlated.
 88. The apparatus as recited in claim 79, wherein said motion information associated with said portion of said reference frame includes information selected from a group comprising Pixel Projection information, Spatial Motion Vector Prediction, combined Pixel Projection and Spatial Motion Vector Prediction information, and multi-hypothesis prediction information.
 89. The apparatus as recited in claim 79, wherein said motion information associated with said portion of said reference frame is null and said mode identifying data identifies that said portion of said reference frame includes said portion of said predictable frame.
 90. The apparatus as recited in claim 79, wherein said logic includes a codec.